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The  two  main  goals  of  this  research  effort  were  to  study  novel  approaches  to  array  signal 
processing  and  construct  an  optimal  theory  of  radar  range-Doppler  imaging.  These  research  goals 
were  met  by: 

1.  Developing  fast,  yet  computationally  efficient,  adaptive  beamforming  techniques  that  exploit 
the  wavelet  domain  structure  of  the  observed  field.  Adaptive  beamforming  and  filtering  algo¬ 
rithms  display  slow  convergence  when  the  correlation  matrix  of  blocks  of  the  input  process  is 
highly  ill-conditioned.  Our  new  approaches  compute  an  adaptive  transformation  of  the  data 
to  ensure  that  the  transformed  data  has  a  well- conditioned  correlation  matrix.  The  adaptive 
transformations  are  computed  either  via  a  fast  sparse  Cholesky  factorization  of  the  estimated 
correlation  matrix  of  the  wavelet  transform  of  the  input  data,  or  via  a  sparse  QR  factorization 
of  the  wavelet  domain  input  data  matrix. 

2.  Constructing  a  new  methodology  for  optimal  design  and  operation  of  fast  high  resolution  range- 
Doppler  imaging  radars,  including  synthetic  aperture  and  inverse  synthetic  aperture  radars 
(SAR  and  ISAR ).  Current  radar  imaging  approaches  are  based  on  a  radar  mode  of  operation 
that  is  designed  for  the  detection  of  point  targets.  As  a  consequence,  images  obtained  in  this 
mode  of  operation  are  filtered  low  resolution  rendering  of  the  actual  range-Doppler  reflectivity 
map.  We  have  shown  that  for  optimal  imaging  of  a  given  target  range-Doppler  reflectivity 
function  with  N  waveforms,  one  must  use  any  N  waveforms  that  form  an  orthogonal  basis 
for  a  space  of  waveforms  that  is  matched  to  the  target  range-Doppler  reflectivity  function. 
We  have  developed  a  novel  optimal  waveform  selection  strategy  for  the  case  where  the  radar 
targets  to  be  imaged  are  known  to  belong  to  one  of  two  classes.  We  have  also  identified  an 
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extension  of  this  approach  to  the  case  where  the  target  can  belong  to  more  than  two  classes.  A 
complete  study  of  the  numerical  and  performance  issue  involved  in  that  extension  is  currently 
under  way. 

In  the  remainder  of  this  report  we  provide  a  brief  description  of  the  results  that  we  have 
obtained  under  this  grant.  These  results  fall  into  two  distinct  areas:  fast  wavelet  domain  adaptive 
filtering  procedures  and  waveform  selection  for  optimal  radar  range-Doppler  imaging.  More  detailed 
descriptions  of  the  results  that  we  review  here  may  be  found  in  a  number  of  publications  that  have 
appeared  in  the  literature  or  are  currently  under  review  or  preparation.  See,  e.g.,  [1],  [2],  [3],  [4]. 

2  Wavelet  domain  adaptive  FIR  filtering  and  beamforming 

Adaptive  Finite  Impulse  Response  (FIR)  filters  are  widely  used  in  a  variety  of  signal  processing 
applications  such  as  adaptive  noise  cancelation,  channel  equalization,  system  identification  and  echo 
cancelation.  Typically  in  these  applications,  filters  are  implemented  in  the  time  domain  as  a  tapped 
delay  line  filter.  The  filter  lengths  are  usually  chosen  to  be  very  large,  to  achieve  the  desired  level 
of  performance  [5].  An  alternative  to  using  such  long  adaptive  FIR  filters  is  to  implement  adaptive 
Infinite  Impulse  Response  (HR)  filters.  Though  HR  filters  require  a  much  smaller  number  of 
coefficients  than  FIR  filters,  adaptive  HR  filters  are  not  widely  used  due  to  problems  of  instability, 
slow  convergence  and  local  minima  [5]. 

The  most  popular  algorithm  used  to  adapt  the  parameters  of  FIR  filters  is  the  Widrow-Hopf 
LMS  algorithm.  Its  popularity  is  due  to  its  low  computational  complexity  and  its  robustness  to 
implementation  errors.  Consider  an  N  tap  adaptive  filter.  Let  be  the  tap  of  the  filter  at 

time  n.  Denote  by  x(n ),  d{n)  and  y(n )  the  input  to  the  filter,  the  desired  response  and  the  output 
of  the  adaptive  filter  at  time  n  respectively.  Furthermore,  denote  by  e(n)  the  output  error,  i.e., 
e(n)  =  d{n )  -  z(n).  Note  that  z(n )  is  given  by 

z(n )  =  xT(n)h(n).  (2-1) 

Here  x(n)  is  the  input  signal  vector  and  h(n)  is  the  adaptive  weight  vector,  i.e., 

x(n)  =  [x(n),x(n  -  1), •••,*(«  -  A  +  1)]T,  (2-2) 

h(n)  =  [h0(n),hi(ra),---,/i7v-i(o)]:r,  (2-3) 

where  (-)7  denotes  the  transpose  operator.  The  standard  LMS  algorithm  updates  h(n)  using  the 
recursion  [6] 

h(»  +  1)  =  h(n)  -  2/xV*  =  h(n)  +  2/ix(n)e(n),  (2.4) 

where  fj,  is  called  the  step  size. 

The  convergence  properties  of  the  LMS  algorithm  are  well  known  [6],  [7].  In  particular,  the 
main  drawback  is  its  convergence  speed.  Let  Rr  be  the  autocorrelation  matrix  of  the  input  vector 
x(rc),  i.e., 

Kx  =  £[x(n)xT(n)]. 

For  a  fixed  step  size  /i,  the  convergence  speed  of  the  LMS  algorithm  depends  on  the  condition 
number,  x(Rr)>  °f  this  input  autocorrelation  matrix  [7].  When  all  the  eigenvalues  of  the  input 
correlation  matrix  are  equal,  i.e.,  the  condition  number  x(^)  =  1?  the  algorithm  converges  fastest. 
As  the  eigenvalue  spread  x(Rr)  increases  (i.e.,  as  the  input  correlation  matrix  becomes  more  ill 
conditioned),  the  algorithm  converges  more  slowly. 
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To  improve  the  convergence  properties  of  the  algorithm,  we  can  transform  the  input  vector 
so  that  the  input  correlation  matrix  in  the  transformed  domain  has  a  lower  eigenvalue  spread.  In 
particular,  if  R*  is  non-singular,  we  can  whiten  the  input  process  x(n)  by  pre- multiplying  x(n)  with 
the  matrix  G"1,  where  Gx  is  the  lower  triangular  Cholesky  factor  of  the  autocorrelation  matrix 


GxGj  —  Rx. 


(2.5) 


This  is  equivalent  to  using  an  LMS-Newton  type  algorithm  [6].  However,  the  main  drawback  of 
this  approach  is  that  it  requires  0(N2)  flops  and  therefore  is  computationally  expensive. 

Whitening,  of  course,  cannot  be  used  when  R*  is  singular.  In  such  a  case,  we  can  use  an 
alternative  procedure  that  we  describe  in  [8]. 

The  increase  in  computational  complexity  due  to  whitening  may  be  alleviated  if  the  autocor¬ 
relation  matrix  Rx  has  some  special  sparse  structure.  In  particular,  if  the  correlation  matrix  is 
diagonal,  this  increase  is  minimal.  This  observation  has  motivated  the  development  of  a  variety  of 
transform  domain  algorithms,  e.g.,  [9]  -  [11]  (for  a  more  complete  list  of  work  in  transform  domain 
filtering  refer  to  [5]),  and  more  recently  [12]  -  [14].  By  properly  transforming  the  input  prior  to  the 
implementation  of  the  LMS  algorithm,  one  hopes  that  the  input  to  the  adaptive  FIR  filter  would 
have  a  nearly  diagonal  correlation  matrix  in  the  transform  domain.  A  LMS-Newton  type  filter 
can  then  be  implemented  in  the  transform  domain  with  much  smaller  computational  burden  and 
improved  convergence  performance.  For  example,  if  the  original  autocorrelation  matrix  is  Toeplitz 
with  Hermitian  symmetry,  when  the  matrix  size  is  large  (large  filter  lengths),  it  can  be  approx¬ 
imated  by  a  circulant  matrix  [10].  Therefore,  a  discrete  Fourier  transform  (DFT)  (implemented 
using  fast  Fourier  transform  (FFT)  algorithms)  or  a  discrete  cosine  transform  (DCT),  will  nearly 
diagonalize  the  correlation  matrix.  The  cost  of  such  a  transformation  is  a  minimal  0{N  log  N) 
flops.  Adaptive  filtering  in  the  sub-bands  has  also  been  discussed  to  improve  the  convergence  speed 
with  low  computational  burden  [15]. 

It  should  be  noted  that  the  use  of  these  various  orthogonal  transforms  will  not  result  in  an 
optimal  convergence  speed  for  all  types  of  input  signals.  The  degree  to  which  a  given  transformation 
succeeds  in  near-diagonalizing  an  autocorrelation  matrix  obviously  depends  on  the  type  of  input 
signal.  For  example,  the  DCT  is  known  to  approximate  the  Karhunen-Loeve  Transform  (KLT) 
when  the  signal  is  a  low  pass  process  but  is  known  to  perform  poorly  when  the  signal  is  a  high  pass 
process  [11],  [16]. 

We  developed  two  novel  fast  wavelet  domain  Newton  type  LMS  algorithms.  These  algorithms 
do  not  attempt  to  diagonalize  the  correlation  matrix  of  the  input  process.  Rather,  they  exploit 
the  special  sparse  structure  of  the  wavelet  transform  of  wide  classes  of  correlation  matrices  and 
that  of  the  Cholesky  factors  of  these  matrices  [17].  Specifically,  the  algorithms  identify  a  whitening 
transformation  in  the  wavelet  domain  to  minimize  computational  complexity.  As  a  result,  they  take 
fewer  computations  to  identify  this  transformation  than  to  compute  a  KLT  or  update  an  estimate 
of  the  inverse  of  the  input  covariance  matrix.  The  procedures  estimate  structured  estimates  of  the 
wavelet  domain  correlation  matrix  of  the  input  process.  They  differ  in  the  exact  estimates  they 
use  and  in  the  way  they  identify  the  data  dependent  transformation.  The  first  approach  explicitly 
computes  a  sparse  estimate  of  the  wavelet  domain  correlation  matrix  of  the  input  process.  It 
then  computes  the  Cholesky  factor  of  this  matrix  and  uses  its  inverse  to  whiten  the  input.  The 
complexity  of  this  approach  is  0(Alog2(Ar)).  In  contrast,  the  second  approach  computes  a  sparse 
approximation  to  the  Cholesky  factor  of  the  wavelet  domain  correlation  matrix  of  the  input  process 
directly .  It  does  so  by  computing  a  sparse  QR  decomposition  of  the  wavelet  domain  data  matrix. 
While  this  approximation  does  not  seem  to  converge  to  the  true  sparse  Cholesky  factor  of  the 
wavelet  domain  input  covariance  matrix,  it  is  effective  in  improving  the  condition  number  of  the 
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input  covariance  matrix.  The  complexity  of  this  second  approach  is  0(N  log(JV)).  Its  computational 
complexity  is,  therefore,  no  higher  than  that  of  the  DCT  or  the  FFT  based  transform  domain  LMS 
algorithms.  Yet,  its  performance  is  vastly  superior  to  that  of  these  latter  algorithms.  Thus,  both 
techniques  lead  to  low  complexity  LMS  algorithms  with  a  convergence  performance  similar  to  that 
of  the  more  complex  KLT  based  or  Newton  type  LMS  algorithms. 

Note  that  the  discrete  wavelet  transform  (DWT)  provides  a  good  approximation  to  the  KLT  of 
several  processes,  [17]  -  [18].  The  sparse  structure  of  the  DWT  of  wide  classes  of  matrices  has  been 
used  to  speed  up  matrix-matrix  and  matrix-vector  multiplications  [19],  [20]  and  positive  definite 
matrix  inversion  [17].  They  have  also  been  used  to  derive  fast  estimation  procedures  [21]. 

LMS  type  algorithms  in  the  wavelet  transform  domain  have  been  suggested  recently  [12]  -  [14]. 
However,  these  algorithms  do  not  exploit  the  structure  of  the  wavelet  transform  of  a  correlation 
matrix.  They  make  the  assumption  that  the  correlation  matrix  in  the  wavelet  transform  domain 
is  nearly  diagonal.  Unfortunately,  this  is  not  always  the  case  (see  Section  5  in  [1]  for  examples). 
Therefore,  the  behavior  of  these  algorithms  would  be  similar  to  that  of  other  transform  domain  LMS 
algorithms.  In  particular,  our  study  (c.f.  [1])  shows  that  the  proposed  DWT  LMS  algorithms  which 
exploit  the  full  structure  of  the  wavelet  domain  correlation  matrices  outperform  these  algorithms 
as  well  as  other  transform  domain  algorithms  that  rely  on  near-diagonalizing  the  autocorrelation 
matrix.  Furthermore,  they  are  more  robust  than  the  other  transform  domain  algorithms:  their 
convergence  behavior  is  not  sensitive  to  changes  in  the  values  of  the  parameters  of  the  correlation 
matrix  of  the  input  process. 

2. A  Wavelet  transform  domain  adaptive  filtering 

Let  us  now  briefly  describe  the  two  new  fast  wavelet  domain  techniques  for  whitening  the  input 
data.  Refer  to  [1]  for  a  more  complete  discussion  of  these  procedures  and  an  analysis  of  the  effect 
of  the  approximations  that  we  make  on  the  performance  of  the  resulting  wavelet  domain  LMS 
algorithms.  These  two  techniques  rely  on  the  special  sparse  structure  of  the  wavelet  transform  of 
wide  classes  of  correlation  matrices  and  that  of  the  Cholesky  factors  of  these  matrices  [17]. 

Specifically,  suppose  that  we  compute  an  M-band  wavelet  decomposition  of  an  input  stochastic 

process  x(n).  Denote  by  R{f(l,  m)  the  cross-correlation  between  the  Ith  component  of  the  stochastic 

process  in  the  jth  band  and  the  mth  component  of  the  process  in  the  kth  band.  It  is  shown  in  [17]  that 

under  very  mild  conditions  on  the  covariance  of  R(l,m)  of  x(n),  \Rif(l,m)\  decays  asymptotically 

to  zero  at  a  rate  much  faster  than  \R(l,  m) |.  In  particular,  denote  by  Ry  the  autocorrelation  matrix 

of  the  wavelet  transform  y(n)  of  the  vector  x(n)  =  [x(n),x(n  -  1), •••,*(»  —  N  +  1)]T.  Let  Ry 

be  the  approximation  to  Ry  that  is  obtained  by  setting  all  entries  of  Ry  of  magnitude  less  than  a 

small  constant  Sr  to  zero.  Furthermore,  let  R^,fc  denote  the  sub-block  of  Ry  that  represents  the 

cross  covariance  between  the  outputs  of  filters  Wj(n )  and  Wk(n).  The  results  of  [17]  indicate  that 

each  sub-block  Rj  k  of  Ru  is  banded.  Furthermore,  each  row  or  column  of  Ry  will  have  0 (log  TV) 
y  y 

non- zero  elements. 

It  is  also  shown  in  [17]  that  the  Cholesky  factor  of  Ry  has  the  same  sparse  banded  structure  as 
that  of  the  original  matrix  (for  this  proof  and  the  choice  of  Sr,  refer  to  [17]).  The  banded  structure 
and  the  bandwidths  of  Ry(fc)  depend  on  the  wavelet  transform  filters  used  and  the  input  process 
characterization. 

The  class  of  processes  for  which  the  above  results  hold  true  is  quite  large.  For  example  it 
includes  all  stationary  processes  that  can  be  obtained  by  passing  white  noise  through  a  rational 
filter.  The  covariance  functions  of  such  processes  decay  exponentially  fast  to  zero  for  fixed  n  as  m 
tends  to  infinity.  It  also  includes  nonstationary  process  such  as  a  sampled  version  of  a  fractional 
Brownian  motion  [22],  [23].  The  fractionally  differenced  white  noise  process  [24],  [25]  and  samples 
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of  the  underwater  background  acoustic  noise  in  shallow  water  taken  along  any  straight  line  [26]  are 
other  examples  of  nonstationary  processes  in  that  class. 

2.  A.l  Fast  Wavelet  domain  LMS  algorithm  via  Cholesky  factorization  of  an  estimated 
covariance  matrix 

Denote  by  y(rc)  =  Qx(n)  the  N  X  N  discrete  wavelet  transform  of  x(n).  We  update  the 
transformed  weight  vectors  w(n)  =  Qh(n),  in  the  proposed  algorithm  using  the  equation 

w(»  +  1)  =  w(n)  -  jig (n).  (2.6) 

Here, 

R»(»)g(n)  =  V„(n),  (2.7) 

where  Ry(n)  is  a  sparse  estimate  of  Ry  at  step  n.  In  particular,  we  compute  Ry  by  estimating  only 
the  expected  non-negligible  entries  of  Ry.  All  other  entries  of  Ry(n)  are  set  to  zero  (see  below). 
Vy(n),  the  instantaneous  gradient,  is  given  by 

Vy(n)  =  -2e(n)y(n),  (2-8) 

where  e(n)  is  output  the  error  given  as  e(n)  =  d(n)  -  w(n)Ty(n). 

We  now  use  the  sparse  Cholesky  factorization  of  Ry(n)  to  solve  (2.7)  by  rewriting  it  as 

Mn)Dy(»)Ly  (n)g(n)  =  Vy(n).  (2.9) 

We  can  solve  the  above  equation  in  three  steps  as 

Ly(n)a  =  V„(n),  (2.10) 

Dy(n)a  =  a  and  (2.11) 

i<s(n)s(n)  =  *•  (2-12) 

Since  Dy(n)  is  a  diagonal  matrix  and  Ly  has  at  most  log  N  non-zero  element  per  row  [17],  evaluating 
g(n)  using  this  approach  requires  0(N  log  N)  operations  only. 

In  order  to  use  the  above  algorithm,  we  need  to: 

1.  compute  the  wavelet  transform  y(n)  of  x(n), 

2.  maintain  and  update  the  sparse  matrix  approximate  Ry(w)  of  the  wavelet  domain  correlation 
matrix, 

3.  compute  the  approximate  Cholesky  factor  L y(ra)  of  Ry(n), 

4.  solve  equations  (2.10)-(2.11). 

Computing  the  wavelet  transform  y(n)  of  x(n)  is  inexpensive:  it  requires  only  0(N )  operations. 
To  compute  Ry  we  proceed  as  follows.  Note  that  the  correlation  matrix  Ry  consists  of  sub¬ 
blocks  Rl-fc,  where  Rl,fe  denotes  the  cross  correlation  matrix  of  the  output  of  the  jth  band  and  that 
of  the  kth  band.  From  the  discussion  of  [17],  the  elements  of  R*,fc  away  from  its  main  diagonal 
decay  to  zero  faster  than  Rx(l,Tn).  The  sparse  approximate  R^,fc  of  R^,fc  obtained  by  setting  all 
elements  of  R£’fc  below  a  threshold  Sr  to  zero,  is  therefore  a  band  diagonal  matrix.  We  form  Ry 
by  replacing  Rpk  in  Ry  by  their  corresponding  sparse  band  diagonal  approximates  R^’  . 
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Given  the  threshold  6r,  the  decimation  factor  M,  the  type  of  the  characterization  of  the  input 
process  (partial  knowledge),  we  can  determine  the  rate  of  decay  of  the  elements  of  Rpk,  away  from 
its  main  diagonal  (c.f.  [17]).  This  in  turn  determines  the  approximate  bandwidths  of  the  matrices 
Ri.fc.  With  this  in  mind  we  define  a  sparsity  map  S  such  that  its  (/,  m)th  element  5(/,m)  is  given 


by 


S(l,m ) 


l  —  m  or  Ry(l ,  m)  is  predicted  to  be  significant 
otherwise. 


(2.13) 


Specifically,  we  choose  S(/,  m)  =  1  whenever  entry  (/,  m)  is  “near”  the  main  diagonal  of  one  of  the 
matrices  R£fc.  For  the  type  of  input  process  in  our  experiments,  we  have  chosen  S(l,m )  =  1  for 
each  l  whenever  entry  (/,  m)  was  three  or  four  entries  or  less  away  from  the  main  diagonal  of  a 
matrix  R|-fc  and  some  other  elements,  which  give  these  bands  a  jagged  appearance.  The  width  of 
these  bands  and  the  exact  jagged  nature  depend  on  the  type  of  input  process. 

We  can  now  develop  a  method  for  updating  Ry(n  +  1)  at  time  n  +  1  in  terms  of  its  estimate  at 
time  n,  by  updating  only  those  entries  for  which  is  non  zero.  The  update  equation  for  each 

element  of  Ry(n  +  1)  is  given  as 


Ry(l,m  :  n  +  1) 


(l-7„)^j,(/,m:n)  +  7„y/(n)ym(n),  S(l,m)  =  1 
0  otherwise 


(2.14) 


where  Ry(l,m  :  n )  denotes  the  element  of  Ry(»)  and  is  a  weighting  factor.  Parameter 

7„  is  usually  chosen  to  be  a  constant  <  1  independent  of  n  or  as  It  allows  us  to  “forget”  past 
input  values. 

Note  that  we  update  only  those  elements  of  Ry(n),  which  are  significant.  Since  Ry  has  only 
0(N  log  N)  non-neglijpble  elements,  this  update  also  requires  0(N  log  N)  operations.  Our  up-date 
scheme  is  not  equivalent  to  updating  R y(n)  by  an  outer  product.  Hence,  the  update  of  Ry(«)  is 
not  a  simple  rank  one  update. 

Solving  (2.10)-(2.11)  requires  0(N log  N)  operations  since  L y(n)  has  at  most  O  (log  IV)  non-zero 
element  per  row  or  column. 

The  lower  triangular  matrix  Ly(n)  and  the  diagonal  matrix  Dy(n)  may  be  computed  using  the 
iteration  [27]: 

For  k  =  1,2 

k—1 

Dy(k ,  k:n)  =  Ry(k,  k:n )  -  X!  ^Kk^P  :  n)Dy(p,P  ■  n) 

P  = 1 

For  i  =  k  +  1,*  •  •  ,N 

k-l  ^ 

Ly(i,k  :  n )  =  (Ry(i,k  :  n )  :  n)Ly(k,p  :  n))/Dy(k,k  :  n), 

P=i 

where  Ry(i,j  :  n )  denotes  the  (i,i)th  element  of  R y(n).  The  computation  in  each  sub-block 
proceeds  until  an  element  of  magnitude  e  or  less  is  computed,  where  e  is  a  pre-selected  threshold 
(see  [17]  for  guidance  on  how  to  select  e  ).  At  this  point  all  remaining  elements  in  the  column  being 
computed  within  that  sub-block  are  set  to  zero.  Since  both  Ry(n)  and  Ly(n)  have  at  most  (7 (log  Ar) 
non-zero  element  per  row  or  column,  this  iteration  requires  0(N  log2  IV)  operations.  Therefore,  the 
main  computational  bottleneck  of  our  algorithm  is  the  computation  of  Ly(n)  from  Ry(n). 

In  the  initial  stages  of  the  algorithm,  the  matrix  Ry(n)  may  not  be  positive  definite,  this  would 
then  imply  that  the  direction  calculated  using  equations  2.10-2.12  might  not  be  the  descent  di¬ 
rection.  This  problem  has  been  studied  in  conjunction  with  the  Newton  method  for  optimization, 


(2.15) 

(2.16) 
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where  the  Hessian  might  turn  out  to  be  indefinite  or  singular.  Several  modified  Newton  methods 
have  been  proposed  in  literature,  which  modify  the  Hessian  whenever  the  Hessian  is  not  posi¬ 
tive  definite  and  reverting  to  the  Newton  directions  as  the  minimum  is  approached,  thus  taking 
advantage  of  the  faster  convergence  of  Newton’s  method. 

An  early  approach  to  this  problem  were  the  directional  discrimination  methods  (cf.  [28]). 
However  these  methods  involve  an  eigenvalue  decomposition.  Another  way  of  modifying  the  Hessian 
approximate,  Rj,  is  to  add  another  positive  definite  matrix  to  it.  The  simplest  positive  definite 
matrix  that  could  be  added  to  Ry(ra)  is  a  small  diagonal  matrix,  k„I  [29],  [30].  «n  is  chosen  to 
ensure  that  R2 ,(n)  +  k„I  is  positive  definite.  Depending  on  the  size  of  Kn,  the  method  behaves  more 
like  the  simple  LMS  ( -*■  oo)  or  LMS-Newton  type  method  («:„  -*•  0).  It  is  thus  desirable  that  Kn 
become  zero  as  the  minimum  is  approached  to  take  advantage  of  the  fast  convergence  properties 
of  the  LMS-Newton  method.  In  practice,  Kn  is  chosen  as  ^  where  c  is  a  constant  for  the  first  100 
to  200  iterations  till  the  estimate  for  Ry(n)  becomes  sufficiently  positive  definite  after  which  we 
can  discontinue  diagonal  loading.  This  assures  that  the  addition  of  Kn I  does  not  contribute  to  the 
excess  MSE,  and  the  speed  of  convergence  is  not  affected  much. 

Another  method  of  modifying  the  Hessian  makes  use  of  the  of  the  fact  that  during  the  Cholesky 
(Ly(n)Dy(n)Ly  (n))  decomposition  of  Ry(n)  one  or  more  diagonal  elements  of  Dy(n)  become  non¬ 
positive  whenever  the  Ry(n)  becomes  nonpositive  definite.  A  simple  modification  to  the  Cholesky 
decomposition  procedure  then  produces  a  Cholesky  factorization  of  Ry(n)  +  d>(n ),  where  d>(ra)  is 
diagonal  with  positive  diagonal  elements.  This  factorization  is  computed  in  a  single  pass  and  avoids 
the  need  for  several  trial  d>(n)  matrices  (the  interested  reader  is  referred  to  [31]). 

Another  simple  engineering  modification  suggests  that  whenever  Ry(n)  becomes  non-positive 
definite,  the  Cholesky  factorization  is  stopped  (as  soon  as  one  encounters  a  negative  diagonal 
element  in  Dy(w))  and  for  that  iteration,  the  descent  direction  is  taken  to  be  the  same  as  that 
of  a  standard  transform  domain  LMS  algorithm.  In  practice  such  a  modification  affects  the  rate 
of  convergence  very  little  and  it  remains  nearly  the  same  as  that  of  the  unmodified  LMS-Newton 
algorithm  with  the  exact  positive  definite  Ry(ra). 


2.A.2  Fast  Wavelet  domain  LMS  algorithm  via  direct  estimation  of  the  Cholesky 
factor  of  the  input  covariance  matrix 

The  main  computational  bottleneck  of  the  algorithm  discussed  above  is  its  Cholesky  factoriza¬ 
tion.  Also,  in  the  initial  stages  of  the  algorithm,  the  matrix  Ry(n)  may  not  be  positive  definite.  We 
therefore  need  a  procedure  that  computes  sparse  triangular  factors  of  R y{n)  at  a  lower  computa¬ 
tional  cost  and  also  works  in  situations  where  the  matrix  R y(n)  may  not  be  positive  definite.  Such 
an  algorithm  may  constructed  using  a  QR  decomposition  of  the  transform  domain  data  matrix. 

Consider  the  data  matrix  Y (»)  obtained  by  stacking  up  the  weighted  input  vectors  of  the 
adaptive  filter  from  time  0  to  n  as  rows 


Y(n)  = 


/  /7(1  -  7)nyr(0)  \ 
1/7(1  -7)(n_1)yT(1) 


V  y/iyT(n)  ) 


v^yT(n)  /  ’ 


(2.17) 


where  7  <  1  is  the  forgetting  factor.  Let  us  compute  the  upper  triangular  factor  U(n)  of  Y(n) 
by  performing  a  QR  decomposition  of  Y(n).  That  factor  converges  (up  to  a  sign  difference  in  the 
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rows)  to  the  Cholesky  factor  of  the  correlation  matrix  R^n)  defined  as 

R  y(n)  =  7(1  -  7)n~ky(k)yT(k )  =  (1  -  7)rv(w  - !)  +  7y(n)yr(n)-  (2-18) 

k= 0 

Thus,  it  is  reasonable  to  estimate  Ry(n)  as  Uj(n)Uy(n),  where  Uy(n)  is  a  structured  “pseudo” 
sparse  upper  triangular  factor  of  the  data  matrix.  In  particular,  we  can  update  the  structured 
upper  triangular  factor  at  every  iteration  rather  than  recompute  it  as  in  the  previous  algorithm. 
The  structured  upper  triangular  factor  that  we  would  obtain  will  be  close  to  the  upper  triangular 
factor  obtained  by  an  exact  QR  decomposition  of  Y (ra)  but  is  not  equal  to  this  exact  factor. 

The  update  equations  for  the  transformed  weight  vectors  can  now  be  written  as 

w(n  +  1)  =  w(n)  -  /Jg (n),  (2.19) 


where 

Uj  (n)Uy(n)g(n)  =  Vy(n).  (2.20) 

The  instantaneous  gradient  Vy(n)  is  obtained  as  in  the  previous  algorithm.  The  above  equation 
can  now  be  solved  in  two  steps  as 


Uj(n)a  =  Vj,(ra),  (2.21) 

Uy(n)g(n)  =  a.  (2.22) 


Since,  Uy(n)  is  sparse  having  at  most  0 (log  TV)  non-zero  elements  per  row,  evaluating  g (n)  using 
this  approach  requires  O(lVlogN)  operations. 

We  can  compute  and  maintain  the  sparse  “pseudo”  upper  triangular  factor  Uy(n)  in  0(TVlog  TV) 
flops.  Specifically,  recall  that  the  upper  triangular  factor  Uy(n)  obtained  by  performing  a  QR 
decomposition  of  the  data  matrix  Y (n)  converges  (up  to  a  sign  difference  in  the  rows)  to  the 
Cholesky  factor  Lj.  Therefore,  the  sparsity  structure  of  both  the  matrices  is  the  same.  With  this 
observation  in  mind,  we  define  a  new  sparsity  map,  S,  similar  to  the  one  described  in  the  previous 
section.  The  ( l,m)th  element  of  this  sparsity  map,  S(/,m),  is  obtained  as 


S(l,m) 


{1  /  =  m  or  Ly(m,l )  is  predicted  to  be  significant 

0  otherwise. 


(2.23) 


The  matrix  Uj,(n)  can  now  be  easily  updated.  This  is  done  by  appending  the  input  vector  y(n) 
as  a  new  row  to  Uy(n  -  1)  and  zeroing  this  row  by  appropriate  Givens  rotations  [27].  The  kth 
Givens  rotation  is  a  multiplication  by  an  orthogonal  matrix  of  the  form 


*fc  = 


I 

0 

0 

0 


VO 


0 

Ck 

0 

~Sk 

0 


0  0 

0  sk 

I  0 

0  Ck 

0  0 


0\ 

0 

0 

0 


I ) 


where  c\  +  s\  =  1  and  the  I’s  represent  identity  matrices  of  appropriate  dimensions.  This  operation 
can  be  expressed  as  „  _ 

(T)  =  Z(”>  i'/U\fiyT(n)  ~ 1})  ’  (2'24) 

Z(n )  is  the  unitary  matrix  denoting  the  effect  of  the  sequence  of  Givens  rotations. 
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The  zeroing  procedure  takes  into  account  the  sparse  structure  of  Uj,(n)  and  proceeds  as  follows: 
we  annihilate  the  first  element  of  the  added  row  by  applying  a  Givens  rotation  to  operate  on  the 
first  and  last  row  of  the  modified  matrix.  During  this  update,  only  the  elements  of  the  first  row  that 
are  predicted  to  be  non-negligible  and  their  corresponding  entries  in  the  last  row  are  updated.  All 
others  in  the  first  row  are  left  as  zeros.  The  elements  of  the  last  row  which  have  not  yet  been  updated 
have  to  be  multiplied  by  the  gain  ci.  However,  this  multiplication  is  not  performed  right  away.  A 
table  is  constructed  to  keep  track  of  the  elements  which  require  this  multiplication.  The  gain  cx  is 
also  stored.  Next,  the  second  element  in  the  last  row  of  the  transformed  matrix  is  annihilated  by 
applying  the  Givens  rotation  to  operate  on  the  second  and  last  rows.  The  second  element  of  the 
last  row  is  first  multiplied  by  cx  in  order  to  calculate  c2  and  s2.  The  previous  procedure  is  repeated 
by  updating  only  the  non-zero  elements  in  the  second  row  and  their  corresponding  elements  in  the 
last  row  after  multiplying  these  by  the  gain  c2ci.  The  element  locations  which  are  not  updated 
at  this  stage  and  the  gains  they  have  to  be  multiplied  with  are  entered  into  the  table.  Thus, 
at  each  rotation,  the  locations  of  the  elements  in  the  last  row  which  are  not  updated  and  their 
corresponding  gains  are  stored  in  the  table.  The  gains  are  a  sequence  of  multiplications  by  c^’s 
and  can  be  either  computed  or  updated  from  previously  calculated  gains.  The  procedure  which 
requires  fewer  number  of  operations  is  chosen.  As  each  row  has  at  most  O(log  N)  elements,  only 
O(log  N )  operations  are  performed  to  update  these  elements.  As  the  gains  are  computed  using  the 
fewest  number  of  flops  and  due  to  the  special  sparse  structure,  the  total  number  of  computations 
required  to  compute  all  the  required  gains  is  0(N  log  N).  The  entire  procedure  therefore  requires 
only  O(NlogN)  computations. 

The  update  process  can  be  best  explained  by  looking  at  an  example.  Consider  an  8  x  8  upper 
triangular  matrix  with  the  sparsity  structure  as  shown  in  Fig.  1.  Note,  that  the  matrix  considered 
in  this  example  is  a  banded  matrix  with  unity  bandwidths.  Also,  the  transform  chosen  is  a  dyadic 
WT.  This  sparsity  structure  corresponds  with  that  of  the  thresholded  covariance  matrix  in  the  WT 
domain,  Ry,  with  diagonal  sub-blocks  The  x's  in  the  figure  indicate  non-zero  elements  of  the 
matrix  and  the  r’s  indicate  the  new  row.  In  the  first  Givens  rotation,  only  the  non-zero  elements 
in  the  first  row  are  updated.  The  first  r  of  the  new  row  is  replaced  by  a  zero,  and  the  elements 
of  this  row  which  are  affected  by  the  non-zero  elements  of  the  first  row  of  the  matrix  are  updated. 
The  remaining  elements  of  the  new  row  have  to  be  multiplied  by  ci  in  order  that  the  sequence 
of  rotations  remain  orthogonal.  The  elements  which  are  not  updated  are  then  entered  into  the 
table  along  with  the  gain  they  need  to  be  multiplied  with.  Table  1  shows  the  book  keeping  for 
the  example  at  each  stage  of  the  Givens  rotation.  From  the  table,  we  see  how  the  gains  can  be 
obtained  from  previously  calculated  gains.  For  example,  the  gain  C3C2Ci  in  the  table  is  calculated 
by  multiplying  the  previously  calculated  gain  c2cj  by  C3  requiring  only  one  multiplication.  Similarly 
products  like  C4C3C2  can  also  be  calculated  by  multiplying  C3C2C1  with  C4  and  dividing  the  result  by 
Ci,  though  in  this  example  we  could  form  the  product  by  direct  multiplications  as  they  require  the 
same  number  of  flops. 

For  a  matrix  having  the  sparse  structure  similar  to  the  one  shown  in  Fig.  1  and  which  uses  a 
dyadic  wavelet  transform,  a  rough  estimate  of  the  number  gain  computations  can  be  obtained  as 


Number  of  floating  point  operations 


-(*-  1) 


(2.25) 


where  k  =  log2  N.  This  shows  that  the  number  of  gain  computations  is  0(N log  N ).  Note  also  that 
in  those  cases  were  the  bandwidths  of  the  sub-blocks  Rpk  are  wider  than  one,  the  number  of  com¬ 
putations  required  to  perform  the  Givens  update  increases  while  the  number  of  gain  computations 
needed  decreases.  However,  the  Givens  update  still  takes  only  0(N  log  A)  flops.  Thus  the  entire 
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Rotation  # 

Updated  elements 

Gains 

Elements 

Extra  Computations 

i 

1,5, 7, 8 

Cl 

2, 3, 4, 6 

0 

2 

2 

C2Ci 

3,4,6 

0 

C2 

5,7,8 

3 

3,6 

C3(C2Cl) 

4 

1 

C3C2 

5,7,8 

4 

4 

C4 

6 

1 

C4C3C2 

5,7,8 

5 

5,7,8 

C5C4 

6 

2 

6 

6 

C6 

7,8 

1 

7 

7,8 

0 

8 

8 

0 

Table  1:  Book  keeping  table  indicating  the  number  of  extra  computations  for  the  8  X  8  matrix 
example. 


update  procedure  takes  only  OfiVlog  N)  flops. 


2.B  Discussion 


The  first  algorithm  that  we  presented  uses  a  structured  estimate  the  WT  domain  covari¬ 
ance  matrix.  It  uses  this  estimate  to  compute  an  exact  Cholesky  decomposition.  Thus  the  error 
L„(n)tJ(n)-R»  tends  to  a  small  value  as  n  — ►  oo.  therefore,  the  error  E (n)  -  Ly(n)L^  (n)  — Ry(n) 
tends  to  E  =  Ky  -  Ky  for  large  n.  This  implies  that  for  large  n,  the  triangular  factors  obtained  by 
this  algorithm  converge  to  the  exact  Cholesky  factors  of  the  sparse  WT  domain  covariance  matrix. 
For  large  n,  the  condition  number  of  the  matrix  (L2/(n)Ly  (n))“1Ry(n)  can  be  written  as 


XULyWtfWr'Kvin))  »  x((LvLJrlR3/)  =  x(R;1Ry)  =  x((Ry  +  E)  %) 

_  Amax  (I  +  Ry  XE)  ^  1  +  Amax(Ry  ^^maxCE) 
Amin (!  +  RylE)  "  1  "  AmaxCRJT'^maxCE)* 

Here  we  have  used  the  inequality  [32], 

<rJ+fc+i(AB)  <  crj+i(A)afc+i(B), 


(2.26) 


(2.27) 


xOOOxOxx 
x  0  0  0  0  0  0 

x  0  0x00 

x  0  0  0  0 

x  0  x  x 

x  0  0 

X  X 
X 

rrrrrrrr 


After  1  *st 


rotation 


xOOOxOxx 
x  0  0  0  0  0  0 

x  0  0  x  0  0 

x  0  0  0  0 

x  0  xx 
x  0  0 

X  X 
X 

0  r  r  r  ©  r  ©  © 


Figure  1:  Zeroing  of  the  first  element  in  the  sparse  QR  updation  procedure. 
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where  <7X(A),  •  •  •  <rjv( A)  denote  the  singular  values  of  A  in  non  ascending  order,  we  get 

\nin(*  +  Ry  *E)  >  1  -  Amax(Ry  1)o'max(E).  (2.28) 

However,  the  elements  of  E  axe  negligible.  Thus,  Amax(Ry  1)^max(E)  <C  1-  Thus  the  condition 
number  is  close  to  unity. 

The  second  algorithm  uses  the  QR  decomposition  to  force  Uy  (n)Uy(n)  to  be  close  to  the 
thresholded  WT  domain  covariance  matrix  Rj,(n).  In  essence,  this  approach  computes  the  sparse 
upper  triangular  factors  of  a  positive  definite  matrix  close  (in  some  norm)  to  the  sparse  covariance 
matrix  Ry(ra).  Note  that  even  though  the  error  E(ra)  =  Ry  —  Uy  (rc)Uj,(n)  is  not  amenable  to  a 
simple  direct  analysis,  our  simulation  results  indicate  that  condition  number  of  the  transformed 
and  preconditioned  input  covariance  matrix,  x(U~r(n)Ry(n)U“1(n))  converges  rapidly  to  close  to 
one.  Thus,  the  sparse  QR  update  procedure  in  conjunction  with  the  wavelet  domain  LMS  algorithm 
yields  a  Newton-LMS  type  algorithm  with  fast  convergence  rate  and  low  (0(JVlog  N ))  complexity. 

In  summary,  both  procedures  are  effective  in  whitening  the  input  data.  The  second  procedure 
has  a  lower  computational  complexity  at  the  expense  of  a  complicated  bookkeeping  procedure. 

The  superior  performance  of  the  two  algorithms  is  clearly  illustrated  by  Figs.  2  and  3.  Fig.  2 
shows  the  learning  curves  of  the  proposed  algorithm,  the  DCT  domain  and  the  DWT  domain  mod¬ 
ified  LMS  algorithms.  Both  the  DCT  and  the  DWT  domain  modified  LMS  algorithms  use  diagonal 
preconditioning.  The  learning  curves  are  for  a  system  identification  example.  The  unknown  system 
was  modeled  as  a  32  tap  FIR  filter.  The  adaptive  filters  were  chosen  to  be  of  the  same  length 
as  the  unknown  modeling  FIR  filter.  The  input  process  to  the  adaptive  filter  was  a  single  pole 
AR  process.  The  pole  location  was  kept  at  -0.98.  The  signal  to  noise  ratio  at  the  output  of  the 
unknown  system  was  chosen  to  be  80dB.  The  learning  curves  were  obtained  as  an  average  of  25 
statistically  independent  experiments.  The  figure  shows  the  superior  performance  of  the  proposed 
algorithms.  As  expected,  the  DCT  domain  modified  LMS  algorithm  does  not  perform  well  when 
the  input  is  a  high  pass  process.  The  figure  also  confirms  the  fact  that  even  the  wavelet  transform 
diagonalizes  the  correlation  matrices  of  only  certain  classes  of  processes  in  the  transform  domain. 
Hence,  with  simple  diagonal  preconditioning  we  might  not  be  able  to  obtain  the  optimum  conver¬ 
gence  rates.  Note  that  /z  for  all  algorithms  was  kept  at  0.005.  A  larger  /z  made  the  DCT  and  DWT 
domain  modified  LMS  algorithms  unstable.  However,  a  larger  /z  improved  the  convergence  speed 
of  the  proposed  algorithm.  This  is  due  to  the  fact  that  the  condition  number  of  the  correlation 
matrix  is  large  even  with  simple  diagonal  preconditioning  in  both  the  transform  domains.  As  /j, 
depends  inversely  on  the  largest  eigenvalue  of  the  correlation  matrix,  a  large  /z  makes  the  DCT  and 
the  DWT  domain  modified  LMS  algorithms  unstable.  However,  the  condition  number  of  the  input 
correlation  matrix  is  made  close  to  unity  by  the  proposed  algorithm  and  so  it  does  not  suffer  from 
these  effects. 

In  order  to  show  the  robustness  of  the  proposed  algorithms  to  the  type  of  input  process,  a  system 
identification  example  was  simulated.  The  input  was  again  a  single  pole  AR  process  with  the  pole 
located  at  0.9.  All  other  parameters  in  this  experiment  were  identical  to  those  of  the  previous 
experiment.  Fig.  3  shows  the  learning  curves  of  the  proposed  algorithm  and  the  DCT  domain 
modified  LMS  algorithm.  The  figure  shows  that  the  performance  of  the  DCT  domain  modified 
LMS  algorithm  is  now  comparable  to  that  of  the  proposed  algorithm.  Comparing  Fig.  2  with  Fig.  3 
shows  that  the  proposed  algorithms  are  not  sensitive  to  the  type  of  input  process  and  achieve 
optimal  convergence  speeds  for  all  inputs.  Simulations  also  show  that  the  final  misadjustment  error 
for  both  the  algorithms  are  nearly  equal. 
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Input  AR  process  pole  at  -0.98,  SNR  80dB 


Figure  2:  Plot  showing  the  learning  curves  of  the  proposed  algorithm  and  the  DCT-LMS  and 
DWT-LMS  algorithms  for  a  system  identification  example.  The  input  signal  is  a  single  pole  AR 
process  with  pole  at  -0.98. 

3  Adaptive  radar  range-Doppler  imaging 

In  [2]  and  [3],  we  describe  a  novel  high  resolution  radar  range-Doppler  imaging  that  relies  on  an 
adaptive  waveform  selection  strategy.  In  this  approach,  the  transmitted  radar  waveforms  are  cho¬ 
sen  sequentially  and  adaptively  to  match  as  closely  as  possible  the  optimal  probing  waveforms  that 
correspond  to  the  range-Doppler  image  that  has  to  be  acquired.  The  new  radar  imaging  procedure 
yields  higher  resolution  images  for  a  given  number  of  transmitted  radar  waveforms  or  a  given  imag¬ 
ing  time  than  traditional  imaging  techniques  that  rely  on  a  fixed  set  of  waveforms.  Alternatively, 
it  produces  an  image  with  the  desired  range-Doppler  resolution  using  minimal  imaging  time  and  a 
minimal  number  of  probing  waveforms.  This  new  approach  can  be  used  in  the  new  generation  of 
flexible  radars  that  are  currently  under  testing  by  several  manufacturers.  We  are  also  investigating 
an  extension  of  these  techniques  for  high  resolution  2-D  Doppler  -range  ultrasound  imaging.  By 
2-D  Doppler,  we  mean  an  estimate  of  the  line  of  sight  and  transversal  velocities  of  the  target.  See 
[33]  for  details. 

Traditionally  radar  range-Doppler  images  (e.g.,  SAR  and  ISAR  images)  have  been  obtained  by 
transmitting  a  single  waveform  or  a  train  of  waveforms  obtained  from  a  single  pulse,  repeatedly. 
The  returns  corresponding  to  the  transmitted  waveforms  are  then  collected,  demodulated,  and 
filtered  using  a  matched  filter.  The  outputs  of  the  matched  filter  are  finally  aligned  and  a  Fast 
Fouier  transform  is  computed  along  each  range  bin  to  generate  the  range-Doppler  images  [34].  The 
range-Doppler  images  thus  obtained  suffer  from  poor  resolution  and  range  migration  effects.  Range 
resolution  increases  with  a  decrease  in  the  width  of  the  transmitted  pulse,  while  Doppler  resolution 
increases  with  the  number  of  pulses  used  to  obtain  the  range-Doppler  image. 

In  contrast  with  traditional  range-Doppler  imaging  techniques,  the  procedure  that  we  devel¬ 
oped  selects  the  transmitted  radar  waveforms  dynamically  and  adaptively.  The  procedure  solves 
a  simultaneous  image  classification  and  reconstruction  problem.  In  this  approach,  progressively 
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finer  images  are  formed  by  processing  the  returns  corresponding  to  the  transmitted  signals.  The 
transmitted  waveforms  are  selected  sequentially  and  adaptively  as  more  information  is  gathered. 
Specifically,  each  transmitted  waveform  is  selected  to  maximize  the  amount  of  improvement  in  the 
quality  (measured  using  the  2  norm)  of  the  current  approximation  to  the  desired  range-Doppler  im¬ 
age  and  provide  guidance  for  the  selection  of  the  following  waveform.  This  strategy  is  based  on  the 
fact  that  the  optimal  waveforms  that  should  be  used  to  image  a  range-Doppler  density  function  are 
singular  functions  of  two  kernels  derived  from  the  range-  Doppler  density  function  [2].  An  added 
advantage  of  the  proposed  approach  is  that  it  does  not  suffer  from  range  migration  effects  because 
it  is  based  on  a  broadband  formulation  of  the  radar  range-Doppler  imaging  problem. 

Consider  first  a  point  reflector  at  a  distance  r(t0 )  from  a  monostatic  radar  at  time  t0.  Let  s(t) 
be  the  waveform  transmitted  by  the  radar.  We  will  assume  here  that  the  echo  received  by  the  radar 
at  time  t  is  given  by 

As(t  -  r(t))  (3.29) 

where  A  is  a  constant  determined  by  the  reflection  properties  of  the  point  target  and  the  propagation 
characteristics  of  the  medium  and  r(t )  is  the  total  delay  incurred  by  the  part  of  the  waveform  that 
arrives  at  the  radar  at  time  t.  Note  that 

r(!)  =  \r ((  -  \r (())  (3.30) 

where  c  is  the  velocity  of  propagation  of  the  electromagnetic  waves  in  the  medium. 

It  is  convenient  to  express  r(t)  as  a  power  series  over  the  signal  duration.  In  particular,  if  the 
change  in  the  velocity  of  the  reflector  over  the  illumination  time  (duration  of  s{t))  is  negligible 
compared  to  c,  then  we  find  that  [8] 


r(t)  «  x  + 


2v(x/2) 
c +  v(x/2) 


(t-x) 


(3.31) 


where  x  is  an  arbitrary  reference  time  instant  and  v(ar/2)  is  the  velocity  of  the  point  reflector  at 
time  x/2  along  the  line  of  sight.  Note  that  it  follows  from  (3.30)  that  the  range  of  the  target  at 
time  t  —  x/2  is  cx/2.  Hence,  the  received  waveform  at  time  t  will  have  the  form  Af{y(t  —  x) 
where  y  =  (c  -  v(x/2 ))/(c  +  v{x/2)).  This  is  the  exact  broadband  form  of  the  echo  rather  than 
the  usual  narrow  band  approximation  used  in  the  literature.  This  formulation  of  the  echo  takes 
automatically  care  of  range- migration  effects. 

Suppose  now  that  the  target  consists  of  a  continuum  of  point  reflectors  located  at  various 
ranges,  moving  with  different  velocities  with  respect  to  the  radar  and  having  different  reflectivities. 
Assuming  that  reflection  and  propagation  are  linear  and  that  all  parts  of  the  target  are  illuminated 
equally,  the  echo  e(t)  due  to  the  target  will  have  the  form 


/•oo  roo 

e(t)=  dy  dxD(x,y)s(y(t-x)). 
JO  J— oo 


(3.32) 


In  the  above  equation  D(x,y )  is  the  reflectivity  of  a  point  target  located  at  range  cx/2  and  moving 
with  velocity  (1  -  y)c/(  1  +  y)  at  time  x/2.  Since  a  negative  range  is  meaningless,  D(x,y)  =  0  for 
all  x  <  0.  Our  goal  in  the  next  section  will  be  to  reconstruct  the  best  approximation  to  D(x,y )  by 
recording  the  echoes  due  to  a  fixed  number  N  of  transmitted  waveforms. 

We  begin  by  observing  that  (3.32)  has  the  same  form  as  an  inverse  wavelet  transform.  Specif¬ 
ically,  if  ip(t)  is  a  valid  wavelet  function  then  any  square  integrable  signal  f(t)  can  be  written 

1  roo  roo 

f(t)  =  —[  dy  dx  F(x,y)y/yip(y(t-  x))  (3.33) 

JO  J— oo 
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where  C \p  is  a  constant  that  depends  on  ip(t)  and  F{x,y )  is  the  continuous  wavelet  transform  of 
f(t )  with  respect  to  This  observation  seems  to  suggest  then  that  one  can  reconstruct  D(x,  y) 
by  transmitting  a  single  valid  wavelet  function  ip(t)  and  computing  the  wavelet  transform  of  the 
echo.  Unfortunately,  as  observed  by  Naparst  [35]  and  Maass  [36],  following  such  an  approach  simply 
yields  the  projection  of  D(x,y )  onto  the  range  of  the  wavelet  transform  with  respect  to  ^(t). 

Clearly  the  problem  that  we  are  addressing  involves  reconstructing  a  2-D  function  from  1-D 
observations.  Thus,  in  general,  reconstructing  an  arbitrary  D(x,y )  will  require  transmitting  and 
recording  an  infinite  number  of  waveforms  and  echoes.  Suppose  on  the  other  hand  that  we  are 
restricted  to  using  N  or  less  waveforms.  How  can  we  optimize  the  choice  of  these  waveforms  to  get 
the  most  accurate  approximation  of  D(x,y )  in  the  norm  ||I?(ar,y)||2  =  dy  dxjD(x,y)j2? 
In  [2]  we  showed  that  if  D(x,y )  is  actually  known,  then  the  optimal  strategy  is  to  transmit  the 
N  singular  functions  corresponding  to  the  N  largest  singular  values  of  two  kernels  derived  from 
D(x,y).  These  singular  functions  happen  to  be  valid  continuous  wavelet  functions. 

The  results  of  [2]  provide  a  general  guideline  for  choosing  N  waveforms  for  optimal  reconstruc¬ 
tion  of  a  range- Doppler  distribution  D(x,y).  Specifically,  the  waveforms  should  be  close  approx¬ 
imations  to  the  singular  functions  corresponding  to  the  N  largest  singular  values  of  two  kernels 
derived  from  D(x,y).  We  can  use  one  of  two  strategies  to  solve  this  problem:  we  can  either  select 
the  N  waveforms  to  good  approximations  to  the  singular  functions  of  the  widest  possible  class 
of  target  kernels,  or,  we  can  select  these  waveforms  adaptively  as  we  acquire  information  about 
D(x,y).  The  former  approach  is  discussed  in  [2].  Here,  we  provide  a  brief  description  of  a  simple 
instance  of  the  latter  approach.  Details  may  be  found  in  [3]. 

3. A  Discretization  of  the  problem 


We  begin  by  briefly  describing  the  discretization  of  the  problem  that  is  needed  to  implement  the 
off-line  and  on-line  stages  of  the  imaging  algorithm.  The  implementation  of  the  optimal  waveform 
selection  strategy  is  based  on  a  discretization  of  the  Fourier  transform  of  (3.32).  Specifically,  note 
that  the  Fourier  transform  of  (3.32)  yields: 


/•OO 

E(u)=  /  dyA(u,y) 
Jo 


(3.34) 


where 


/oo 

dxD(x,y)e~3“x. 

-OO 


(3.35) 


Reconstructing  A{u,u/u)  is  essentially  the  same  as  reconstructing  D(x,y)  since  they  are  related 
by  a  one-to-one  transformation.  It  can  be  shown  that  the  operator  corresponding  to  A(u>,u>/u)  is 
compact  and  therefore  admits  a  singular  value  decomposition  [2].  Hence,  A(u?,w/u)  can  be  approx¬ 
imated  using  its  N  singular  functions  corresponding  to  the  largest  singular  values. 

Assuming  that  the  support  of  the  scale  parameter  y  is  the  interval  [ymin,ymax],  (3.34)  becomes 


ty-max 

E(u)=  /  dyA(u>,y) 

Vmin 


y 


(3.36) 


Making  the  change  of  variable  u  =  w/y,  we  get 

E(u)  =  Jy'du  A(w,w/tt)^. 

ymax 


(3.37) 
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In  practice  D(x,y)  is  also  of  finite  support  along  the  variable  x.  Hence  A (u,y)  has  an  infinite 
support.  However,  we  shall  limit  ourselves  to  reconstructing  A (u>,y)  within  a  frequency  interval 
[-Umax, Umax]-  The  choice  of  umax  controls  the  ultimate  achievable  spatial  resolution.  We  further 
normalize  the  u  frequency  axis  such  that  umax  corresponds  to  tt. 

It  remains  for  us  to  determine  how  we  should  discretize  A(w,u/u).  Two  discretization  schemes 
are  possible.  In  the  first  scheme,  we  elect  to  discretize  S(u)  on  a  uniform  grid  along  the  frequency 
variable  u>.  Unfortunately,  this  leads  to  the  reconstruction  of  D(co,y)  on  a  very  sparse  grid  in 
the  u>  —  y  plane.  This  leads  to  a  very  poor  construction  of  A(u>,  y).  Alternatively,  we  may  elect  to 
discretize  A(w,  y)  on  a  uniform  rectangular  grid  in  the  lo  —  y  plane.  This  leads  to  a  structured  sparse 
matrix  and  a  non-uniform  sampling  of  S(u>)  along  the  frequency  variable  u;.  The  exact  structure 
of  the  discrete  matrix  A  that  corresponds  to  A (u,y)  depends  on  the  range  of  y  that  we  need  to 
consider.  To  find  the  value  of  the  Fourier  transform  of  the  optimal  waveform,  we  use  an  approach 
similar  to  a  singular  value  decomposition.  Specifically,  we  determine  the  N  vectors  Sn,  n  —  l,N, 
that  yield  the  matrix  An  =  En=i(^sn)Sn  that  is  closest  in  a  windowed  Frobenius  norm  to  A. 
The  window  disregards  entries  of  A  that  are  know  to  be  zero  a  priori  because  of  the  discretization 
scheme  that  we  have  chosen.  Here,  ASn  represents  a  sampled  version  of  the  Fourier  transform  of 
the  return  and  denotes  the  complex  conjugate  transpose  of  Sn. 

3.B  Waveform  selection  in  the  two  target  class  case 

The  waveform  selection  strategy  is  based  on  an  important  observation:  If  we  seek  to  minimize 
the  windowed  Frobenius  norm  of  the  error  between  A  and  An,  we  need  not  necessarily  use  the 
optimal  vectors  Sn.  It  suffices  to  use  any  set  of  vectors  un  that  form  an  orthogonal  basis  for  the 
space  spanned  by  the  vectors  Sn  (c.f.  [2]  for  the  proof  of  this  result). 

In  particular,  we  may  divide  the  problem  of  reconstructing  an  unknown  target  reflectivity 
function  into  two  stages.  In  the  first  stage  the  target  reflectivity  matrices  of  the  collection  of 
targets  is  clustered  into  distinct  classes.  Clustering  is  achieved  by  observing  the  dimension  of 
intersection  of  the  target  matrices’  right  subspaces.  This  step  is  done  offline.  The  second  stage 
is  done  online  and  can  be  divided  into  two  steps,  an  identification/  reconstruction  step  and  an 
enhancement  step.  In  the  first  step,  and  since  the  target  class  is  unknown,  vectors  outside  the  right 
subspaces  of  the  two  classes  have  to  be  sent  as  shown  in  Fig.4  for  the  two  target  classes  A  and 
B.  Since  these  vectors  are  not  optimal  in  the  sense  that  they  do  not  lie  in  the  appropriate  spaces 
corresponding  to  either  A  or  B,  the  approximation  error  increases  from  the  minimum  achievable  by 
an  optimal  set  of  vectors.  In  the  second  step,  once  the  target  is  identified,  one  proceeds  by  sending 
an  appropriate  set  of  vectors  corresponding  to  the  identified  class  to  obtain  a  higher  resolution  of 
the  target  reflectivity  matrix. 

3.B.1  Reconstruction  Error 

In  order  to  quantify  the  approximation  error  resulting  from  using  an  arbitrary  set  of  N  or¬ 
thonormal  vectors,  we  assume  that  the  target  is  A.  Let  An  be  the  approximation  of  A  using  a  set 
of  N  orthonormal  vectors  t;  for  i  =  1, ... ,  N.  Then  we  can  write  the  error  as 

||A  —  Avj|/r  =  Jmin  +  Jexcess  (3.38) 

where 

Jmin  =  J2  (3’39) 

k=N+ 1 
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is  the  minimum  error  energy  obtained  when  using  the  optimal  vectors  for  reconstruction.  It  can  be 
shown  [4]  that  the  excess  error  Jexcess  satisfies  the  bound 


(<7 %(A)  -  a%^{A))EX  <  Jexcess  <  (crJ(A)  -  02n(A))EL  (3.40) 

where  E 1  is  the  total  energy  put  in  R\.  These  bounds  are  sharp  whenever  a  significant  gap  exists 
in  the  set  of  singular  values  such  that  <rx(A)  >  02(A)  >  •••  >  0n(A)  >  a/v+i(A)  >  0n+2(A)  > 
. . .  >  <7 n(A).  A  similar  expression  also  holds  when  the  target  class  is  B.  It  follows  from  (3.40)  that 
JexZss  can  be  reduced  by  reducing  Ex.  This  can  be  achieved  by  selecting  identification  vectors 
that  have  successively  the  least  energy  possible  in  R.j[  and  Rg-  This  requirement  can  be  satisfied  by 
using  the  notion  of  principal  vectors  and  angles  between  two  subspaces  [37].  The  angles  0X, . . .  ,0jv 
between  two  subspaces  Ra  and  Rg  are  defined  recursively  for  i  —  1, . . . ,  N  by 


subject  to 


cos  6k  =  max  max  uTv  =  ujfvfc 

U£Ra  V£Rb 

(3.41) 

||u||  =  ||v||  =  1 

(3.42) 

uTut-  =  0  i  =  1, . . . ,  k 

(3.43) 

vTv;  =  0  *  = 

(3.44) 

The  principal  angles  satisfy  0  <  9X  <  •  •  •  <  0N  <  ?r/2.  The  vectors  {ui, . . . ,  uat}  and  {yi, . . . ,  vjv} 
are  called  principal  vectors  between  Ra  and  II b  and  both  form  an  orthonormal  basis  for  these 
subspaces,  respectively.  From  this  definition,  one  can  show  that  the  unit  vector  having  the  least 
component  in  R\  and  Rg  is  given  by 


ti  =  ox(ux  +  vx)  (3.45) 

where  ux  and  vx  are  the  first  principal  vectors  of  Ra  and  Rg  ,  respectively  and  ax  a  scalar  chosen 
such  that  ||tx||  =  1  and  tx  bisecting  the  first  principal  angle  6\.  One  can  show  [37]  that  the  vector 
having  the  least  component  in  La  D  sp<m{tx, . . . , t* — 1}*^"  and  Lb  0  span{ tx, . . . , tj_x}'L  is  given  for 
i  =  2, . . .  ,p  by 


t,-  =  «j(uj  +  v.)  (3.46) 

where  Ui  and  v,-  are  the  i t h  principal  vectors  of  La  and  Lg  ,  respectively  and  a,-  a  scalar  chosen 
such  that  ||t,||  =  1  and  tt  bisecting  the  ith  principal  angle  $i.  We  will  show  in  the  next  section 
that  we  may  actually  have  to  use  vectors  of  the  form  t,'  =  o,Ui  +  (3t v ,  for  some  scalars  and 
to  minimize  the  number  of  transmissions  needed  for  identification.  The  number  of  transmissions 
required  for  identification  is  given  by  p. 

Once  the  target  class  is  identified,  we  can  obtain  a  higher  resolution  target  reflectivity  matrix 
by  sending  the  principal  vectors  (up+x, . . . , ujv)  or  (vp+x, . . . , v^v)  depending  on  the  target. 

After  K  transmissions,  the  target  density  approximation  is  given  by 

TK  =  f:(Tt,)tf  (3.47) 

1  =  1 

where  T  denotes  the  target  and  Tk  its  approximation  after  K  transmissions. 

Equation  (3.40)  has  another  important  implication.  When  E ^  is  small  enough  for  both  target 
classes,  the  components  in  R \  and  Rg  are  small  so  that  for  practical  purposes  their  right  subspaces 
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can  be  considered  to  coincide.  In  terms  of  principal  angles,  if  0(,i  <  N,  is  small  the  total  energy  in 
R-L  and  Rjj  due  to  the  identification  vectors  tx,  —  t,-  will  be  small.  If  i  is  large  enough,  the  excess 
error  energy  is  reduced  and  the  approximation  error  energy  approaches  the  value  given  in  (3.39). 
The  above  suggests  the  following  clustering  criterion.  We  say  that  two  target  densities  belong  to 
the  same  class  if  for  some  i  <  N,  the  ith  principal  angle  between  their  right  subspaces  falls  below 
some  prescribed  threshold  Oth •  Obviously,  the  larger  i,  the  ’’closer”  will  be  the  densities  belonging 
to  the  same  cluster.  In  an  extreme  case,  when  i  =  N,  the  right  subspaces  of  both  classes  would 
coincide. 

As  a  final  note  here,  we  would  like  to  point  out  that  when  clustering  matrices,  each  class  contains 
in  general  more  than  one  matrix.  Although  the  matrices  in  a  particular  class  are  said  to  be  close  , 
a  response  due  to  different  matrices  of  the  same  class  might  lie  in  disjoint  left  subspaces  as  shown 
in  Fig.6. 

3.B.2  Waveform  Selection 


Classes  are  identified  by  the  subspaces  of  the  optimal  probing  vectors.  These  are  the  the 
subspaces  spanned  by  the  right  singular  vectors  corresponding  to  the  N  largest  singular  values. 
The  singular  values  play  no  role  in  this  categorization.  To  see  why  this  is  so,  consider  an  extreme 
case  where  the  right  and  left  subspaces  of  two  target  matrices  coincide  but  have  different  singular 
values.  In  the  clustering  step,  they  would  belong  to  the  same  class  despite  the  difference  in  their 
singular  values. 

For  minimum  excess  error,  we  would  like  to  devise  a  test  to  determine  the  target  by  observing 
the  return  from  a  single  transmission.  For  simplicity,  we  will  consider  first  the  case  when  the 
left  subspaces  of  A  and  B  are  not  intersecting.  Denoting  the  left  space  observation  by  r  and  the 
transmitted  vector  by  x,  we  can  set  up  our  identification  problem  as  a  composite  hypotheses  testing 
problem  that  is  given  by 


H0  (A  target)  :  r  =  A^x  -f  w 

Hi  (B  target)  :  r  =  B^x  +  w 

where  An  and  Bn  are  the  optimal  rank- N  approximations  of  A  and  B,  respectively,  and  w  is  the 
uncertainty  due  to  the  component  in  the  subspaces  R ^  or  Rg-  This  uncertainty  is  present  because 
of  the  absence  of  information  about  the  singular  values  of  A  and  B.  Since  no  stochastic  information 
is  available  about  the  uncertainty,  we  form  the  following  generalized  likelihood  ratio 


A(r)  = 


mxX  ||r  —  Bnx^ 
max 

X 


||r- Anx||2 

The  maximization  leads  to  a  least-square  estimate  of  x  and  the  likelihood  ratio  becomes 


(3.48) 


A,  x  _  ||(/-jBiv(^^)-1^)r||2 

A(r)  ||(J-AJv(A£Aiv)-1Afr)r||2 

ll(J-^)r[|2 

W(i-PaW 
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where  Pa  and  Pb  are  projection  matrices  on  La  and  Lb ,  respectively.  Hence,  our  rule  becomes 
measuring  the  energy  in  L\  and  Ljj  and  deciding  on  the  target  that  has  a  smaller  energy,  i.e.,  that 
is  closer  in  the  2-norm  to  its  respective  left  subspace  assuming  equally  likely  targets. 

We  now  examine  the  conditions  under  which  a  test  based  on  (3.49)  will  provide  correct  decisions. 
We  rewrite  (3.46)  as 


t;  =  cm,-  +  \/l  -  a2 u j-  i  =  l,...,p  (3.50) 

where  u is  a  vector  perpendicular  to  u;  and  lies  in  the  subspace  spanned  by  u,-  and  v,  as  shown 
in  Fig.5.  In  equation  (3.50)  we  allowed  ourselves  the  flexibility  to  control  the  components  of  t,-  in 
Ra  and  in  R\  using  the  scalar  a.  Unlike  (3.46),  t;  need  not  bisect  the  principal  angle  0,. 

Let  the  target  be  A  and  let  Ea  =  a2.  It  can  be  shown  that  for  a  correct  decision  we  have  the 
following  condition  [4] 


1  >Ea> 


1  +  P(A) 


VA 


where 


In  (3.52)  0min  and  0„ 


p(A)  = 


si  n2  8 min  (Jj^A) 


(3.51) 


(3.52) 


sin26maxaj^+1(A) 

are  the  minimum  and  maximum  principle  angles  between  the  left  subspaces 


of  the  target  matrices  A  and  B.  Similarly,  one  can  show  that  if  the  target  is  B  then 


12 Eb  2  Wm  =  VB  “d 

sin^  @rnaxG  %+i  (B) 

where  Eb  is  the  energy  of  t;  in  Rb- 

The  inequalities  in  (3.51)  and  (3.53)  show  that  tj  cannot  be  placed  arbitrarily  between  the 
principal  vectors  and  that  erroneous  decisions  might  result  from  a  choice  such  as  the  one  given 
in  (3.46).  Therefore,  it  might  be  necessary  to  sacrifice  the  reconstruction  performance  of  one 
transmission  to  identify  the  target.  It  is  desirable  to  have  t)a  and  tjb  take  as  small  values  as 
possible  so  that  both  (3.51)  and  (3.53)  are  simultaneously  satisfied.  This  situation  is  depicted  in 
Fig.  7.  It  shows  how  t;  can  vary  without  violating  the  inequalities.  It  shows  also  that  in  this 
case  the  choice  given  in  (3.46)  is  acceptable.  Small  values  for  t]a  and  r\ b  are  obtained  whenever 
<7n(A)/<tn+i(A)  ^  \sin6max/sindmin\  and  <Tjv(i?)/<T/v+ i{B)  ^  \sinOmax/sin9min\.  For  this  to 
occur  it  is  desirable  to  have  a  significant  gap  in  the  distribution  of  the  singular  values  of  both 
target  matrices.  If  the  left  subspaces  intersect,  or  if  8min  is  too  small  so  that  (3.51)  and  (3.53) 
cannot  be  simultaneously  satisfied,  we  have  to  modify  our  strategy.  Specifically,  rather  than  using 
(3.50),  we  let  t;  =  u,.  The  return  will  either  lie  in  LA  if  the  target  is  A,  or  will  have  components 
in  Lb  and  Lg  if  the  target  is  B.  If  the  right  subspaces  are  close,  the  component  in  Lg  could  be 
so  small  that  one  could  consider  the  target  to  be  B.  In  this  case,  at  least  another  transmission 
would  be  necessary  to  determine  the  target.  This  scenario  could  occur  whenever  there  exists  an 
intersection  between  the  left  subspaces  of  the  two  classes.  For  practical  purposes,  the  dimension 
of  this  intersection  is  measured  by  the  number  of  principal  angles  that  falls  below  some  prescribed 
threshold.  If  the  dimension  of  this  intersection  is  p,  then  we  need  at  most  p  transmissions  to  identify 
the  target.  This  is  the  same  p  as  in  (3.46). 


(3.53) 

(3.54) 
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3.C  Waveform  selection  in  the  general  M  >  2  target  class  case 

The  extension  of  the  above  procedure  to  the  case  where  the  target  may  belong  to  one  of 
M  classes  (rather  than  just  two)  is  conceptually  straightforward.  However,  the  implementation 
is  more  complex.  The  added  complexity  is  due  to  the  fact  that  unlike  the  two  class  case,  the 
procedure  typically  takes  longer  to  identify  the  class  to  which  the  target  belongs.  Furthermore, 
selecting  the  waveform  to  be  transmitted  at  any  point  requires  that  we  solve  a  max-min  problem. 
This  max-min  problem  must  be  solved  in  real-time  during  actual  imaging  unless  enough  memory  is 
available  for  storing  all  the  possible  waveform  selections  that  can  be  made  during  actual  imaging. 
We  have  developed  a  procedure  that  solves  the  resulting  max-min  problem  efficiently  by  exploiting 
the  geometry  of  the  libraries  of  waveforms  that  correspond  to  the  M  classes  (c.f.  [3]  for  details). 

Fig.  (8)  shows  the  reduction  in  the  Frobenius  norm  of  the  error  between  an  estimated  target 
reflectivity  function  and  the  actual  target  reflectivity  as  a  function  of  the  number  of  waveforms 
a  typical  experiment.  The  three  curves  compare  the  case  where  we  know  the  class  to  which  the 
target  belongs  (lowest  curve),  where  the  target  is  known  to  belong  to  one  of  two  classes  (middle 
curve)  and  the  case  where  the  target  is  known  to  belong  to  one  of  three  classes  (upper  curve).  Note 
that  as  the  uncertainty  in  class  membership  increase,  we  pay  a  price  for  our  ignorance  in  a  reduced 
reconstruction  quality. 
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4.  “Recent  Developments  in  Wavelet  Theory  and  Applications,”  A.  H.  Tewfik,  invited  pa¬ 
per,  NSF  Sponsored  International  Conference  on  Mathematical  Analysis  and  Signal  Pro¬ 
cessing,  Cairo,  Egypt,  Jan.  1994. 
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11.  “Multiscale  Difference  Equation  Signal  Modeling  and  Analysis”  M.  Ali  and  A.  H.  Tew¬ 
fik,  in  Proc.  of  the  1994  IEEE  Conf.  on  Acoust.  Speech  and  Signal  Proc.,  Adelaide, 
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12.  “Wavelet  Domain  Bearing  Estimation  in  Unknown  Correlated  Noise”  A.  H.  Tewfik,  in 
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13.  “Robust  Multiresolution  Integrated  Target  Sensing  And  Recognition,”  MIT  Lincoln 
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19.  “Multiscale  Difference  Equation  Signal  Modeling  and  Analysis”  M.  Ali  and  A.  H.  Tew¬ 
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and  0.  Gerek,  in  Proc.  of  the  1995  IEEE  Conf.  on  Acoust.  Speech  and  Signal  Proc., 
Detroit,  MI,  May  1995. 
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•  Patents 

None  filed  in  1993-1995. 
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Input  AR  process  pole  at  0.9,  SNR  80dB 


Iterations 


Figure  3:  Plot  showing  the  learning  curves  of  the  proposed  algorithm,  and  the  DCT-LMS  for  a 
system  identification  example.  The  input  signal  is  a  single  pole  AR  process  with  pole  at  0.9. 


Right 

Subspaces 


Figure  4:  Identification  vectors  relative  to  the  right  subspaces. 


Figure  5:  Transmitted  vector  as  a  combination  of  the  iih  principal  vectors. 
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Right  Left 

Subspaces  Subspaces 


Figure  6:  Left  subspace  corresponding  to  two  different  target  classes. 


Figure  7:  Range  of  the  transmitted  vector,  t,-,  for  correct  decision. 
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Figure  8:  Improvement  of  the  reconstructed  D(x,y)  in  a  typical  experiment  as  a  function  of  the 
number  of  waveforms  used  in  three  scenarios:  known  target  class,  two  possible  target  classes  and 
three  possible  target  classes. 
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